That’s why we are playing around with RMarkdown
today.
Clearly, there’s no better way of doing so than throwing in a bunch of
cats.
RMarkdown is the easiest way to create interactive documents integrating text, code and the output of your code. It is fairly versatile, has a shallow learning curve and, as often happens in R, there is a bunch of people continuously expanding its functionality and possibilities.
For instance, you can:
Yes, you guessed that. This very document has been generated using
RMarkdown.
Not all that glitters is gold, though. Expect some headaches when
setting up your machine (especially on Windows) if you want to
knit pdfs or do more advanced stuff.
Source: https://bookdown.org/yihui/rmarkdown/
The document format “R Markdown” was first introduced in the knitr
package (Xie 2015, 2020c) in early 2012. The idea was to embed code
chunks (of R or other languages) in Markdown documents.
However, the original version of Markdown invented by John Gruber was often found overly simple and not suitable to write highly technical documents. For example, there was no syntax for tables, footnotes, math expressions, or citations. Fortunately, John MacFarlane created a wonderful package named Pandoc (http://pandoc.org) to convert Markdown documents (and many other types of documents) to a large variety of output formats. More importantly, the Markdown syntax was significantly enriched. Now we can write more types of elements with Markdown while still enjoying its simplicity.
In a nutshell, R Markdown stands on the shoulders of knitr and Pandoc. The former executes the computer code embedded in Markdown, and converts R Markdown to Markdown. The latter renders Markdown to the output format you want (such as PDF, HTML, Word, and so on).
The rmarkdown package (Allaire, Xie, McPherson, et al. 2020) was first created in early 2014.
Markdown is a lightweight markup language that you can use to add formatting elements to plaintext text documents. Created by John Gruber in 2004, Markdown is now one of the world’s most popular markup languages.
It’s the language used to write READMEs and project descriptions on GitHub.
LaTeX, which is pronounced «Lah-tech» or «Lay-tech» (to rhyme with «blech» or «Bertolt Brecht»), is a document preparation system for high-quality typesetting. It is most often used for medium-to-large technical or scientific documents but it can be used for almost any form of publishing.
LaTeX is not a word processor! Instead, LaTeX encourages authors not to worry too much about the appearance of their documents but to concentrate on getting the right content.
You need LaTeX only for knitting to pdf. If you’re happy with html notebooks, there’s no need to install it.
If you need to convert files from one markup format into another, pandoc is your swiss-army knife.
The good news is that your RStudio IDE already ships with an embedded Pandoc installation! Maybe it’s not the latest, but it will work just fine most of the time.
Almost. Fire up your RStudio and install RMarkdown
first.
install.packages("rmarkdown")
Time to start your first RMarkdown document.
File -> New File -> R Markdown
We’ll stick to html today. Knitting documents to pdf is undoubtedly
cooler, but requires installing LaTeX, which might be tricky, depending
on the machine you are using.
Congratulations - You’ve just created your first
RMarkdown document.
You should be seeing something like this:
The first part is called the metadata.
The metadata is written between the pair of three dashes ---. The syntax
for the metadata is YAML (YAML Ain’t Markup Language, https://en.wikipedia.org/wiki/YAML), so sometimes it is
also called the YAML metadata or the YAML frontmatter. Before it bites
you hard, we want to warn you in advance that indentation matters in
YAML, so do not forget to indent the sub-fields of a top field
properly.
(Source - https://bookdown.org/yihui/rmarkdown/basics.html).
It sounds intimidating. However, you won’t have to do much with your
metadata most of the time, besides copy pasting it from a template.
Pheeeewww!
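To see what the indentation warning means in practice, here is a minimal sketch, assuming the rmarkdown package is installed. It writes a tiny .Rmd file to a temporary location and reads its metadata back with rmarkdown::yaml_front_matter(). The title and the toc option are made-up values for illustration, not the ones used in this document.

```r
library(rmarkdown)

# Write a tiny .Rmd file with a YAML header (field values are illustrative).
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Cats and RMarkdown"',
  "output:",
  "  html_document:",   # sub-field of 'output': indented by two spaces
  "    toc: true",      # sub-field of 'html_document': indented further
  "---",
  "",
  "Some narrative text."
), rmd_path)

# Parse the metadata block into a nested R list.
meta <- rmarkdown::yaml_front_matter(rmd_path)
meta$title
meta$output$html_document$toc
```

The nesting of the resulting list mirrors the indentation of the YAML fields, which is why a misplaced space can silently change what a setting applies to.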
In the rest of the script you can distinguish between text and
chunks of code. Note how chunks of code are enclosed by three
backtick signs (```).
If you can’t find the backtick sign on your keyboard, try the
ASCII code: Alt+96.
Look at the two buttons highlighted by the arrows.
knit my notebook?
Knitting your RMarkdown script means
rendering it to your chosen output (html in this case). There is quite a
lot of machinery (=dark magic) happening behind the scenes. Fortunately,
for most applications we don’t have to understand how these happen. Just
enjoy the result.
Workflow - Source: ‘https://bookdown.org/yihui/rmarkdown-cookbook’
RMarkdown document?
Let’s start with the easiest: how to add text
(=narrative) to my RMarkdown. Easy-peasy.
Just type!
You just need to know a couple of things:
Going to a new line -> Add two spaces at the end
of the line.
Leaving an empty line -> Two ways: 1. <br>,
2. \newline (followed by an empty line).
Titles and Headers:
# Header - Header 1
## Header - Header 2
### Header - Header 3
Basic formatting. You can use some basic markdown formatting to make
your text:
Italic: *Felis catus* -> Felis catus
Italic: _Felis catus_ -> Felis catus
Bold: **Felis catus** -> Felis catus
Bold: __Felis catus__ -> Felis catus
Both: ***Felis catus*** -> Felis catus
Both: ___Felis catus___ -> Felis catus
Adding Lynx:
[Eurasian Lynx - Wikipedia](https://en.wikipedia.org/wiki/Eurasian_lynx)
Renders as:
Eurasian Lynx - Wikipedia
Embed a cat image from url/file:
<center>
{height=300px}
</center>
Renders as:
Black Footed Cat
Create numbered lists of cats (don’t forget to leave an empty line
before starting the list). E.g.,
My favourite wildcats:
1. Andean Cat (*Leopardus jacobita*)
2. Rusty Spotted Cat (*Prionailurus rubiginosus*)
3. Chinese Mountain Cat (*Felis bieti*)
4. Kodkod (*Leopardus guigna*)
Renders as:
Create unordered lists of cats:
Where small cats live:
* Small Cats of South America
    + Andean Cat
    + Geoffroy’s Cat
    + Jaguarundi
    + ...
* Small Cats of SE Asia
    + Leopard Cat
    + Marbled Cat
    + Fishing Cat
    + ...
Renders as:
Do Yourself a Favour - and download the catsheet
(=cheatsheet)
Catsheet
- https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
To insert a chunk of code, just enclose it between two lines of three backticks,
and add {r} right after the opening backticks:
```{r}
4+4
```
It will render as:
4+4
## [1] 8
Let’s practice a bit. We need to import some cat-related data first.
big.cats <- read.table("data/Wikipedia_LargestCats.txt", header = TRUE, sep = "\t")
big.cats
## Rank Common.name Scientific.name Weight.range.kg
## 1 1 Tiger Panthera tigris 90-300
## 2 2 Lion Panthera leo 160-270
## 3 3 Jaguar Panthera onca 56-120
## 4 4 Cougar Puma concolor 53-100
## 5 5 Leopard Panthera pardus 17-90
## 6 6 Cheetah Acinonyx jubatus 20-60
## 7 7 Snow leopard Panthera uncia 22-55
## 8 8 Eurasian lynx Lynx lynx 15-45
## 9 9 Sunda clouded leopard Neofelis diardi 12-26
## 10 10 Clouded leopard Neofelis nebulosa 11.5-23
## Maximum.weight.kg Maximum.length.m Native.range.by.continent
## 1 388.78 4.17 Asia
## 2 375.00 3.64 Asia, Africa, Europe
## 3 160.00 2.60 North and South America
## 4 125.20 2.80 North and South America
## 5 96.50 2.75 Asia, Africa, Europe
## 6 72.00 2.10 Africa, Asia, Europe
## 7 75.00 2.50 Asia
## 8 38.00 1.50 Asia, Europe
## 9 27.00 1.30 Asia
## 10 23.00 1.08 Asia
This data.frame contains the weight range, and the maximum observed
weights and lengths of the ten largest wildcats. (Source: Wikipedia).
We then load tidyverse, a set of powerful packages for
data manipulation and visualization.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
When loading tidyverse, we get a bunch of startup messages.
Not so nice in a report. You can deactivate them by opening your chunk
with {r, warning=F, message=F}.
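If you would rather silence messages and warnings for every chunk at once, one option is to set them globally in a first setup chunk through knitr::opts_chunk$set(). This is only a sketch of that alternative, assuming knitr is installed; it is not something the chunks in this document rely on.

```r
library(knitr)

# Set default chunk options for all chunks that follow in the document.
# Individual chunks can still override them in their own header.
opts_chunk$set(message = FALSE, warning = FALSE)

# Check the current default for one option.
opts_chunk$get("message")
```

Setting the defaults once keeps the chunk headers short, at the price of making the behaviour less visible where each chunk is defined.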
Let’s do some real R code. In the chunk below, we split the weight
range field into min and max using some fancy tidyr and dplyr code,
from tidyverse.
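big.cats <- big.cats %>%
  # split "90-300" into two columns, dropping the original one
  separate(Weight.range.kg, into = c("Weight.min", "Weight.max"), sep = "-", remove = TRUE) %>%
  # the new columns are character: convert them to numbers
  mutate(Weight.min = as.numeric(Weight.min),
         Weight.max = as.numeric(Weight.max)) %>%
  # keep the cats in their original (rank) order when plotting
  mutate(Common.name = factor(Common.name, levels = big.cats$Common.name))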
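To make the splitting step easier to follow, here is a small self-contained sketch on three rows copied from the table above. It assumes the tidyr package is installed and uses the convert = TRUE argument of separate() as a shortcut for the as.numeric() calls; the object names cats_small and cats_split are only for illustration.

```r
library(tidyr)

# Three rows taken from the big.cats table shown earlier.
cats_small <- data.frame(
  Common.name = c("Tiger", "Lion", "Clouded leopard"),
  Weight.range.kg = c("90-300", "160-270", "11.5-23")
)

# Split the range at the dash; convert = TRUE turns the pieces into numbers.
cats_split <- separate(
  cats_small, Weight.range.kg,
  into = c("Weight.min", "Weight.max"),
  sep = "-", convert = TRUE
)

str(cats_split)
```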
Maybe in your report you want to automatically include a value which
you calculate in your R script. This can be done with some inline
code: enclose the code in single backticks and
start it with r. For instance, the text:
The cat having the highest weight is `r big.cats[1,'Common.name']`.
It weighs up to `r big.cats[1,'Maximum.weight.kg']` kg.
Renders as:
The cat having the highest weight is Tiger.
It weighs up to 388.78 kg.
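The inline example reads row 1, which works because the table happens to be sorted by rank. A slightly more robust sketch is to compute the value first and reference the result inline. The small data frame below copies three values from the table above, in a deliberately shuffled order; the names cats_small, heaviest and sentence are hypothetical.

```r
# Three rows from the table above, not sorted by weight on purpose.
cats_small <- data.frame(
  Common.name = c("Lion", "Tiger", "Jaguar"),
  Maximum.weight.kg = c(375.00, 388.78, 160.00)
)

# Pick the row with the largest maximum weight, whatever the row order.
heaviest <- cats_small[which.max(cats_small$Maximum.weight.kg), ]

# Build the sentence in R; inline code could print `sentence` directly.
sentence <- sprintf(
  "The cat having the highest weight is %s. It weighs up to %.2f kg.",
  as.character(heaviest$Common.name), heaviest$Maximum.weight.kg
)
sentence
```

In the text of the report, the inline code would then refer to heaviest$Common.name or sentence instead of a hard-coded row number.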
A couple of useful arguments when you start your chunk:
{r echo=F} — For running your code in the background,
without showing the code itself.
{r eval=F} — Opposite. For showing your code, but without
actually running it.
THE THING about RMarkdown is that it
allows embedding graphs directly into your document. Input changed? Just
re-knit and you’ll have all your graphs updated. It’s as easy as simply
running a chunk of code.
Let’s make a graph to see the differences in cat size more easily,
through a forest plot. We install the fantastically useless package
cats first. All it does is randomly select a cat
image to be used as the background of our ggplot graphs. It’s not on CRAN,
so we also need the package remotes to install it from GitHub.
install.packages("remotes")
remotes::install_github("hilaryparker/cats")
We can then load the package and use it to get our much needed random cat image.
library(cats)
ggcats <- ggplot(data=big.cats) +
cats::add_cat() + ## add a random cat image on the background of the graph, if you fancy
geom_segment(aes(y=Common.name, yend=Common.name, x=Weight.min, xend=Weight.max),
arrow = arrow(length = unit(5, "points"),
ends="both", type = "closed", angle = 40)) +
ylab(NULL) +
xlab(NULL) +
theme(axis.text = element_text(size=14))
ggcats
Some useful figure related arguments here:
{r fig.height=3} - picture height in inches
{r fig.width=3} - picture width in inches
{r fig.cap="add caption here"} - Add a caption
{r fig.align="center"} - Horizontal alignment of your
graph
{r dpi=150} - Change resolution of output image (mostly
relevant for pdf)
{r, fig.height=3, fig.width=4, fig.cap="Weight (kg) of the 10 largest wild cats", fig.align="center", dpi=150, echo=F}
Figure: Weight (kg) of the 10 largest wild cats
Note how the code isn’t visible anymore, having set
echo=F. I swear it’s there, though.
If you just print an R object on the console as we did above,
RMarkdown will show it in the rendered document as well.
It won’t look that good, though.
The entry level way of rendering tables is the
knitr::kable function.
knitr::kable(head(big.cats[1:4,1:5]), caption="The largest cats!")
| Rank | Common.name | Scientific.name | Weight.min | Weight.max |
|---|---|---|---|---|
| 1 | Tiger | Panthera tigris | 90 | 300 |
| 2 | Lion | Panthera leo | 160 | 270 |
| 3 | Jaguar | Panthera onca | 56 | 120 |
| 4 | Cougar | Puma concolor | 53 | 100 |
An even nicer way is using the kableExtra package. For
instance, when rendering to html, kableExtra allows the
creation of responsive tables. This is extremely useful for tables
wider than an A4 page.
library(kableExtra)
knitr::kable(big.cats, caption="The largest cats!") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
position = "center")
| Rank | Common.name | Scientific.name | Weight.min | Weight.max | Maximum.weight.kg | Maximum.length.m | Native.range.by.continent |
|---|---|---|---|---|---|---|---|
| 1 | Tiger | Panthera tigris | 90.0 | 300 | 388.78 | 4.17 | Asia |
| 2 | Lion | Panthera leo | 160.0 | 270 | 375.00 | 3.64 | Asia, Africa, Europe |
| 3 | Jaguar | Panthera onca | 56.0 | 120 | 160.00 | 2.60 | North and South America |
| 4 | Cougar | Puma concolor | 53.0 | 100 | 125.20 | 2.80 | North and South America |
| 5 | Leopard | Panthera pardus | 17.0 | 90 | 96.50 | 2.75 | Asia, Africa, Europe |
| 6 | Cheetah | Acinonyx jubatus | 20.0 | 60 | 72.00 | 2.10 | Africa, Asia, Europe |
| 7 | Snow leopard | Panthera uncia | 22.0 | 55 | 75.00 | 2.50 | Asia |
| 8 | Eurasian lynx | Lynx lynx | 15.0 | 45 | 38.00 | 1.50 | Asia, Europe |
| 9 | Sunda clouded leopard | Neofelis diardi | 12.0 | 26 | 27.00 | 1.30 | Asia |
| 10 | Clouded leopard | Neofelis nebulosa | 11.5 | 23 | 23.00 | 1.08 | Asia |
It won’t look that good on a pdf, but on an html report it’s just as good as it can possibly be.
Yes, it is possible to add references to your markdown document.
There are multiple ways for doing it. You
can even link your Zotero library, if you wish. However, the easiest
is probably to use the knitcitations package. We load it
first.
install.packages("knitcitations")
library(knitcitations)
We can now cite any work simply by referring to its DOI. How?
The text:
Cats are not necessarily animals `r citep('10.1007/s10670-022-00588-w')`. But if they are, they should be left free to roam `r citep('10.1007/s12136-019-00408-x')`
will render as:
Cats are not necessarily animals (Hermida, 2022). But if they
are, they should be left free to roam (Abbate, 2019)
To create a bibliography, we need to use the respective
command.
bibliography()
## [1] C. Abbate. "A Defense of Free-Roaming Cats from a Hedonist Account
## of Feline Well-being". In: _Acta Analytica_ 35.3 (Oct. 2019), pp.
## 439-461. DOI: 10.1007/s12136-019-00408-x.
## <https://doi.org/10.1007/s12136-019-00408-x>.
##
## [2] M. Hermida. "Cats are not necessarily animals". In: _Erkenntnis_
## (Aug. 2022). DOI: 10.1007/s10670-022-00588-w.
## <https://doi.org/10.1007/s10670-022-00588-w>.
…and yes, you can also change the citation style, but it is a bit laborious and we don’t cover it here.
Whenever you knit your RMarkdown report, R will re-run
all the code contained in your .Rmd script. It’s therefore not the best
idea to include a chunk of code taking 3 hours to run. If you do so,
even just correcting a typo in the text will require you to wait three
hours before the corrected version of your report is rendered!
There are a couple of workarounds, though.
Add the argument {r cache=T} to your chunk:
Adding this argument to the slowest chunks of code will save the
intermediate results of these chunks in a dedicated folder.
RMarkdown will only rerun these chunks when
their code changes. Otherwise, it will skip the chunk and load the cached
results directly.
{r eval=F} + save and load
Sometimes, it might be more convenient to run slow chunks of code in a
dedicated, interactive session (maybe even on a different machine), save
the results in a .RData file, and reload this saved data in
your report. For transparency, you might still show the code you used to
produce these intermediate outputs, but by setting eval=F, you’ll tell
RMarkdown not to run this code.
Something like this.
Chunk 1 with slow code I ran elsewhere:
```{r eval=F}
# slow_function() and input are placeholders for your own SLOW computation
output <- slow_function(input)
save(output, file = 'output.slow.RData')
```
Chunk 2 reimporting the output of chunk 1:
```{r}
load(file = 'output.slow.RData')
```
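A variation on the same idea, sketched here under the assumption that a single object needs to be stored, is to let one chunk decide by itself whether the slow code has to run. It uses saveRDS() and readRDS() from base R; the function slow_computation() and the file name are hypothetical stand-ins.

```r
# Where the intermediate result is stored (a temporary folder for this demo).
cache_file <- file.path(tempdir(), "slow_result.rds")

# Stand-in for a really slow piece of code.
slow_computation <- function() {
  Sys.sleep(1)
  sum(1:10)
}

if (file.exists(cache_file)) {
  # Result already on disk: just read it back.
  result <- readRDS(cache_file)
} else {
  # First run: compute, then store the result for next time.
  result <- slow_computation()
  saveRDS(result, cache_file)
}
result
```

Deleting the .rds file forces the computation to run again, which is the manual counterpart of what cache=T tracks automatically.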
Clicking on the knit button is convenient. The keyboard
shortcut Ctrl+Shift+K is even more so.
Sometimes, you might want to knit multiple documents (yes, you might
loop across several .Rmd scripts, and knit hundreds of
reports in parallel). To do so, you can render an
.Rmd file from the console with:
rmarkdown::render("KateRMarkdown.Rmd")
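Looping over several files could look like the sketch below. It relies on rmarkdown::render(), which runs knitr and then Pandoc to produce the final output, and it processes the files one after the other rather than in parallel. The helper render_all() and the output folder name "reports" are assumptions made for this example.

```r
# Render each .Rmd file in turn and return the paths of the output files.
# rmarkdown:: is called by name, so the package must be installed.
render_all <- function(rmd_files, output_dir = "reports") {
  vapply(rmd_files, function(f) {
    rmarkdown::render(f, output_dir = output_dir, quiet = TRUE)
  }, character(1))
}

# Collect all .Rmd scripts in the working directory.
rmd_files <- list.files(pattern = "\\.Rmd$")
```

Calling render_all(rmd_files) would then knit every script found and place the results in the reports folder.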
I know. You can’t wait to produce your fantastic pdf reports, and
write your next paper directly in R.
Good news - it is possible. There are many templates out there, and you
just have to fetch them. See for instance: https://t.co/uJBqWER5h6?amp=1
Yes, there are also templates which will format bibliographic
references as requested by different journals.
Bad news - you’ll need to set up your machine first, and it might be tricky sometimes. You can find some guidance at: https://bookdown.org/yihui/rmarkdown-cookbook/install-latex.html.
Many good resources out there. I only cite two:
Now it’s up to you to create a beautiful RMarkdown
report full of cats. The more cats, and the cuter, the better.
Pick a cute, wild species of cat and:
RMarkdown project
rgbif package (help code below)
Help code to download data from gbif
library(tidyverse)
library(rgbif)
myspecies <- "Caracal caracal" ## example
# look up the GBIF species key for a scientific name
get.speciesKey <- function(x){name_backbone(x)$speciesKey}
key <- get.speciesKey(myspecies)
# extract the first n occurrences with coordinates from GBIF
get.occurrences <- function(x, n=100){occ_search(taxonKey=x,
                                                  limit=n, hasCoordinate = TRUE)}
dat <- lapply(key, get.occurrences, n=100)[[1]]
# clean data: keep only the columns of interest
dat <- dat$data %>%
  dplyr::select(species, year:day, country, stateProvince,
                decimalLongitude, decimalLatitude)
sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22000)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Italian_Italy.utf8 LC_CTYPE=Italian_Italy.utf8
## [3] LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C
## [5] LC_TIME=Italian_Italy.utf8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitcitations_1.0.12 kableExtra_1.3.4 cats_0.1
## [4] forcats_0.5.2 stringr_1.5.0 dplyr_1.0.10
## [7] purrr_0.3.5 readr_2.1.3 tidyr_1.2.1
## [10] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
## [13] cowplot_1.1.1 hexSticker_0.4.9
##
## loaded via a namespace (and not attached):
## [1] fs_1.5.2 lubridate_1.9.0 webshot_0.5.4
## [4] httr_1.4.4 tools_4.2.2 backports_1.4.1
## [7] bslib_0.4.1 utf8_1.2.2 R6_2.5.1
## [10] DBI_1.1.3 colorspace_2.0-3 withr_2.5.0
## [13] tidyselect_1.2.0 curl_4.3.3 compiler_4.2.2
## [16] textshaping_0.3.6 cli_3.4.1 rvest_1.0.3
## [19] xml2_1.3.3 labeling_0.4.2 sass_0.4.4
## [22] scales_1.2.1 hexbin_1.28.3 systemfonts_1.0.4
## [25] digest_0.6.31 yulab.utils_0.0.6 svglite_2.1.1
## [28] rmarkdown_2.18 jpeg_0.1-10 pkgconfig_2.0.3
## [31] htmltools_0.5.4 showtext_0.9-5 bibtex_0.5.1
## [34] dbplyr_2.2.1 fastmap_1.1.0 highr_0.9
## [37] rlang_1.0.6 readxl_1.4.1 rstudioapi_0.14
## [40] sysfonts_0.8.8 gridGraphics_0.5-1 jquerylib_0.1.4
## [43] farver_2.1.1 generics_0.1.3 jsonlite_1.8.4
## [46] googlesheets4_1.0.1 magrittr_2.0.3 ggplotify_0.1.0
## [49] Rcpp_1.0.9 munsell_0.5.0 fansi_1.0.3
## [52] RefManageR_1.4.0 lifecycle_1.0.3 stringi_1.7.8
## [55] yaml_2.3.6 plyr_1.8.8 grid_4.2.2
## [58] crayon_1.5.2 lattice_0.20-45 haven_2.5.1
## [61] hms_1.1.2 magick_2.7.4 knitr_1.41
## [64] pillar_1.8.1 reprex_2.0.2 glue_1.6.2
## [67] evaluate_0.19 ggimage_0.3.1 ggfun_0.0.9
## [70] modelr_0.1.10 vctrs_0.5.1 tzdb_0.3.0
## [73] cellranger_1.1.0 gtable_0.3.1 assertthat_0.2.1
## [76] cachem_1.0.6 xfun_0.35 broom_1.0.1
## [79] viridisLite_0.4.1 ragg_1.2.4 googledrive_2.0.0
## [82] gargle_1.2.1 showtextdb_3.0 timechange_0.1.1
## [85] ellipsis_0.3.2